
extend nonzero to int64 #125850

Open · wants to merge 16 commits into main

Conversation

bhack
Contributor

@bhack bhack commented May 9, 2024

Fixes #51871
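
For context, a minimal sketch of the failure this PR targets (issue #51871): torch.nonzero on a CUDA tensor with more than INT_MAX elements. The tensor shape and the behaviour described in the comments are illustrative assumptions, not taken from this PR.

import torch

INT_MAX = 2**31 - 1

# A boolean CUDA tensor just past the 32-bit element limit (~2 GiB of storage).
x = torch.zeros(INT_MAX + 1, dtype=torch.bool, device="cuda")
x[-1] = True

# Without the fix, the CUDA nonzero path is limited to 32-bit element counts,
# so a call like this raised a RuntimeError instead of returning the index.
idx = torch.nonzero(x)
print(idx)  # expected once fixed: tensor([[2147483647]], device='cuda:0')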

@pytorch-bot pytorch-bot bot added the release notes: cuda label May 9, 2024

pytorch-bot bot commented May 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125850

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 3 Unrelated Failures

As of commit a46722a with merge base 8f30f36:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@bhack
Contributor Author

bhack commented May 9, 2024

/cc @ezyang @eqy This is an exploratory, black-box PR: I don't have free CUDA resources right now, and we don't have a quick way to set up the environment for contributing sparse C++/CUDA PRs (see #125297).

But I made this editable on your side, in case you have the environment ready and a quick fix is enough.

@bhack bhack marked this pull request as ready for review May 9, 2024 16:52
@bhack bhack requested a review from eqy as a code owner May 9, 2024 16:52

linux-foundation-easycla bot commented May 10, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

Collaborator

@eqy eqy left a comment


Should we add a (presumably large tensor) test for this?

@bhack
Contributor Author

bhack commented May 10, 2024

Should we add a (presumably large tensor) test for this?

Do we already have an INT_MAX test somewhere that we could expand?

@eqy
Collaborator

eqy commented May 10, 2024

Unfortunately these are not really unified at the moment, but this should surface some examples: https://github.com/search?q=repo%3Apytorch%2Fpytorch+64bit+language%3APython+path%3A%2F%5Etest%5C%2F%2F&type=code
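
As a rough sketch of what such a big-tensor test could look like, modelled on the 64-bit test pattern the search link above surfaces (the decorator usage and the memory annotation are assumptions here, not part of this PR):

import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, largeTensorTest, onlyCUDA)
from torch.testing._internal.common_utils import TestCase, run_tests


class TestNonzeroLarge(TestCase):
    @onlyCUDA
    @largeTensorTest("32GB", device="cuda")  # memory budget is a rough guess
    def test_nonzero_past_int_max(self, device):
        numel = 2**31 + 2  # just past INT_MAX
        x = torch.zeros(numel, dtype=torch.bool, device=device)
        x[0] = True
        x[-1] = True
        idx = torch.nonzero(x).flatten()
        self.assertEqual(idx.tolist(), [0, numel - 1])


instantiate_device_type_tests(TestNonzeroLarge, globals(), only_for="cuda")

if __name__ == "__main__":
    run_tests()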

@bhack
Contributor Author

bhack commented May 10, 2024

As we don't have a specific CUDA test, do we want to find a workaround from Python?

Can you suggest one from grep -R torch.nonzero test/?

@bhack
Contributor Author

bhack commented May 10, 2024

I think I am going to close this, as cub::DispatchSelectIf will probably be slower than the cub::DeviceSelect::Flagged path we are currently using.

We probably need to wait for the upstream fix in NVIDIA/cccl#1422.

What do you think?
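
For readers unfamiliar with the two CUB entry points being compared, a plain-Python picture of the difference is below; this is intuition only, not how the CUDA kernel is written.

# DeviceSelect::Flagged-style: selection driven by a precomputed flag per item.
def select_flagged(indices, flags):
    return [i for i, f in zip(indices, flags) if f]

# DispatchSelectIf-style: a predicate is evaluated on each item on the fly.
def select_if(indices, values, predicate):
    return [i for i, v in zip(indices, values) if predicate(v)]

values = [0.0, 3.0, 0.0, -1.5]
indices = list(range(len(values)))
flags = [v != 0 for v in values]  # what nonzero's flag iterator would yield
assert select_flagged(indices, flags) == [1, 3]
assert select_if(indices, values, lambda v: v != 0) == [1, 3]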

@bhack
Contributor Author

bhack commented May 11, 2024

@ezyang Do you think we can open a new ticket to lower this with Triton where and sum?
https://github.com/pytorch/pytorch/blob/a174c536f8f32b41da9efa647e364196423468a5/torch/_inductor/lowering.py#L2187C20-L2187C35

Edit:
The ticket is at #126003
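
One way to picture the decomposition behind that ticket is the sketch below, written in plain PyTorch ops rather than the actual Inductor lowering (the function name is made up here): the output length comes from a sum over the mask, the output slots from a cumulative sum, and the data-dependent output size is what makes nonzero hard to compile.

import torch

def nonzero_1d_sketch(x: torch.Tensor) -> torch.Tensor:
    mask = x != 0
    count = int(mask.sum())  # data-dependent output length ("sum")
    slots = torch.cumsum(mask.to(torch.int64), dim=0) - 1  # output slot per hit
    out = torch.empty(count, dtype=torch.int64, device=x.device)
    idx = torch.arange(x.numel(), device=x.device)
    out[slots[mask]] = idx[mask]  # scatter each hit's index into its slot
    return out

x = torch.tensor([0, 5, 0, 0, 2, 7])
assert torch.equal(nonzero_1d_sketch(x), torch.nonzero(x).flatten())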

using flag_iterator_t = cub::NullType*;
using equality_op_t = cub::NullType;

return cub::DispatchSelectIf<
Contributor Author

@bhack bhack May 11, 2024


Does this require cub/cccl 2.4.0?

@ezyang
Contributor

ezyang commented May 11, 2024

Yes, we need a big tensor test. @eqy's link is good for examples.

@bhack
Contributor Author

bhack commented May 11, 2024

Ok, thanks.
So I am going to close it, as I don't have the environment, or spare GPU compute right now, to write a brand-new test and recompile.
That is, unless we identify another Python test that already uses nonzero indirectly and that we could modify with a big input.

@bhack
Contributor Author

bhack commented May 11, 2024

I just wanted to check whether it could at least compile with the current CUB version.
Do you know what this CI failure is?

/usr/local/cuda/include/cub/agent/agent_select_if.cuh(264): error: function "at::native::<unnamed>::NonZeroOp<T>::operator() [with T=c10::complex<c10::Half>]" cannot be called with the given argument list
            argument types are: (int64_t)
            object type is: at::native::<unnamed>::NonZeroOp<c10::complex<c10::Half>>
                  selection_flags[ITEM] = select_op(items[ITEM]);

@bhack
Contributor Author

bhack commented May 13, 2024

I think we need the cub/cub/agent/agent_select_if.cuh changes introduced in NVIDIA/cccl#1379.

So this means we need to wait for the next CUDA 12.4 update and also make the change conditional.

@ezyang
Contributor

ezyang commented May 14, 2024

This PR seems fine. I agree you may need to preprocessor your way to victory. CI will say.

@ezyang ezyang added the triaged label May 14, 2024
@ezyang ezyang self-requested a review May 14, 2024 18:02
@bhack
Contributor Author

bhack commented May 23, 2024

@ezyang the new CUDA 12.5 delivers CUB 2.4.0, so it could be enough for this workaround.

@ezyang
Contributor

ezyang commented May 28, 2024

if you're willing to wait for cuda 12.5 :)

@bhack
Contributor Author

bhack commented May 28, 2024

if you're willing to wait for cuda 12.5 :)

This version is required for the workaround API. A full upstream solution will require waiting for more CUDA releases.

@ezyang
Contributor

ezyang commented May 29, 2024

Sorry, I'm not sure I understand the state of this PR. I would happily accept a PR that makes nonzero int64 work on CUDA 12.5 or later, subject to the requirement that this functionality is preprocessored out. If you don't mind waiting, we can also ice this PR until CUDA 12.6 shows up by default and then we can just land it as is.

@bhack
Contributor Author

bhack commented May 30, 2024

Testing/using the workaround in this PR currently requires CUB 2.4.0.
As PyTorch currently uses CUB from the official CUDA distribution, that means we need to upgrade PyTorch to support CUDA 12.5 (any plan?).

For a "regular" upstream solution we need this to be merged:
NVIDIA/cccl#1422

After that we need an official CUB release, and that release needs to be included in a CUDA release.
So we have no guarantee that we will have an upstream solution with CUDA 12.6.

@bhack bhack mentioned this pull request May 30, 2024
@bhack
Contributor Author

bhack commented May 30, 2024

@ezyang
Contributor

ezyang commented May 31, 2024

@atalman for CUDA 12.5

Labels
open source · release notes: cuda · triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tensor.nonzero fails on GPU for tensors containing more than INT_MAX elements
4 participants